Revealing Phonological Similarities between Related Languages from Automatically Generated Parallel Corpora
Author
Abstract
In this paper, we present an approach to automatically revealing phonological correspondences within historically related languages. We create two bilingual pronunciation dictionaries for the language pairs German-Dutch and German-English. The data is used for automatically learning phonological similarities within each language pair via EM-based clustering. We apply our models to predict, from the phonemic form of a German word, the phonemes of its Dutch and English cognates. The similarity scores show that German and Dutch phonemes are more similar than German and English phonemes, which supplies statistical evidence for the common knowledge that German is more closely related to Dutch than to English. We assess our approach qualitatively, finding meaningful classes caused by historical sound changes. The classes can be used for language learning.
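The abstract does not spell out the clustering model, so the following is only a minimal sketch of what EM-based clustering over phoneme pairs could look like. It assumes a latent-class mixture p(g, d) = Σ_c p(c) p(g|c) p(d|c) over a single syllable position. The function names `em_cluster` and `predict` and the toy German-Dutch onset counts are hypothetical illustrations, not the authors' code or data.

```python
import random
from collections import defaultdict


def em_cluster(pairs, n_classes=3, n_iter=50, seed=0):
    """Fit p(g, d) = sum_c p(c) p(g|c) p(d|c) to phoneme pairs with EM."""
    rng = random.Random(seed)
    counts = defaultdict(float)
    for g, d in pairs:
        counts[(g, d)] += 1.0
    g_syms = sorted({g for g, _ in counts})
    d_syms = sorted({d for _, d in counts})
    total = sum(counts.values())

    def random_dist(symbols):
        # Random positive weights, normalised to a probability distribution.
        weights = [rng.random() + 0.1 for _ in symbols]
        z = sum(weights)
        return {s: w / z for s, w in zip(symbols, weights)}

    p_c = [1.0 / n_classes] * n_classes
    p_g = [random_dist(g_syms) for _ in range(n_classes)]
    p_d = [random_dist(d_syms) for _ in range(n_classes)]

    for _ in range(n_iter):
        # E-step: distribute each observed pair count over the classes.
        exp_c = [0.0] * n_classes
        exp_g = [defaultdict(float) for _ in range(n_classes)]
        exp_d = [defaultdict(float) for _ in range(n_classes)]
        for (g, d), n in counts.items():
            joint = [p_c[c] * p_g[c][g] * p_d[c][d] for c in range(n_classes)]
            z = sum(joint)
            for c in range(n_classes):
                r = n * joint[c] / z
                exp_c[c] += r
                exp_g[c][g] += r
                exp_d[c][d] += r
        # M-step: re-estimate parameters from the expected counts.
        for c in range(n_classes):
            p_c[c] = exp_c[c] / total
            if exp_c[c] > 0.0:  # leave an emptied class unchanged
                p_g[c] = {g: exp_g[c][g] / exp_c[c] for g in g_syms}
                p_d[c] = {d: exp_d[c][d] / exp_c[c] for d in d_syms}
    return p_c, p_g, p_d


def predict(g, model):
    """Return the Dutch phoneme d that maximises the joint score with g."""
    p_c, p_g, p_d = model
    scores = {
        d: sum(p_c[c] * p_g[c].get(g, 0.0) * p_d[c][d] for c in range(len(p_c)))
        for d in p_d[0]
    }
    return max(scores, key=scores.get)


# Hand-made toy counts of (German onset, Dutch onset) pairs, loosely inspired
# by cognates such as zehn/tien, Pfund/pond, Tag/dag and drei/drie.
pairs = [("ts", "t")] * 8 + [("pf", "p")] * 6 + [("t", "d")] * 5 + [("d", "d")] * 4
model = em_cluster(pairs, n_classes=3)
print(predict("ts", model))
```

EM only finds a local optimum, so the learned classes and the predicted phoneme depend on the random initialisation. A fuller treatment would presumably fit separate models per syllable position and compare several restarts.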
Similar resources
Revealing Phonological Similarities be
In this paper, we present an approach to automatically revealing phonological classes within historically related languages. A newly created bilingual German-Dutch pronunciation dictionary is used for learning phonological similarities between the onsets, nuclei and codas of these two languages via EM-based clustering. Our evaluation is twofold: we apply the models to predict from a German word...
Revealing phonological similarities between German and Dutch
In this paper, we present an approach to automatically revealing phonological classes within historically related languages. A newly created bilingual German-Dutch pronunciation dictionary is used for learning phonological similarities between the onsets, nuclei and codas of these two languages via EM-based clustering. Our evaluation is twofold: we apply the models to predict from a German word...
Automatic Generation of Bilingual Dictionaries Using Intermediary Languages and Comparable Corpora
This paper outlines a strategy to build new bilingual dictionaries from existing resources. The method is based on two main tasks: first, a new set of bilingual correspondences is generated from two available bilingual dictionaries. Second, the generated correspondences are validated by making use of a bilingual lexicon automatically extracted from non-parallel, and comparable corpora. The qual...
Automatic Dictionary Expansion Using Non-parallel Corpora
Automatically generating bilingual dictionaries from parallel, manually translated texts is a well established technique that works well in practice. However, parallel texts are a scarce resource. Therefore, it is desirable also to be able to generate dictionaries from pairs of comparable monolingual corpora. For most languages, such corpora are much easier to acquire, and often in considerably...
Graphonological Levenshtein Edit Distance: Application for Automated Cognate Identification
This paper presents a methodology for calculating a modified Levenshtein edit distance between character strings, and applies it to the task of automated cognate identification from nonparallel (comparable) corpora. This task is an important stage in developing MT systems and bilingual dictionaries beyond the coverage of traditionally used aligned parallel corpora, which can be used for finding...
Journal title:
Volume, issue:
Pages: -
Publication date: 2005